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NEWTON-TYPE METHODS FOR REML ESTIMATION IN 
GENETIC ANALYSIS OF QUANTITATIVE TRAITS 

KATERYNA MISHCHENKO, SVERKER HOLMGREN, AND LARS RONNEGARD 

■ Abstract. Robust and efficient optimization metliods for variance compo- 
O ■ nent estimation using Restricted Maximum Lifceiitiood (REML) modeis 
^ I for genetic mapping of quantitative traits are considered. We siiow tiiat 

' tlie standard Newton- AI sctieme may faii wtien ttie optimum is iocated at 

Q ■ one of ttie constraint boundaries, and we introduce different approaclies 

^ ! to remedy tliis by taking tlie constraints into account. We approximate tlie 

^ I fiessian of tlie objective function using the average information matrix and 

,—1 ■ also by using an inverse BFGS formula. The robustness and efficiency is 
evaluated for problems derived from two experimental data from the same 

^! animal populations. 

O: 
d : 

^ • 1. Introduction 

' — One of the goals in modern computational genetics is to locate regions in 

^ ; the genome underlying quantitative traits. This is performed by mapping 

Q\ ; of quantitative trait loci (QTL), which is a procedure involving statistical 

^ ; analysis of data sets derived from experimental populations. QTL mapping 

CN ; is based on the idea of relating phenotypic and marker genotype informa- 

^ ! tion. The QTL are regions on the genome where the genetic marker infor- 

! mation and the phenotypic values show strong co-variation. We focus on 

■ experimental crosses where animals from two divergent breeds have been 

■ mated for two generations producing a large number of grand-offspring. 
*^ • In its simplest form, QTL analysis assumes that all animals within the 

■ - - ' two divergent breeds show no genetic variation with the founder breeds 



( [Haley and Knott, 1992[ ), which is motivated by the expectation of having 
most of the genetic variation between breeds. There may be substantial 
genetic variation within breeds, which may be taken into account in a vari- 



ance component QTL model ( [Fernando and Grossman, 1989t Goldgar, 1990 



rd and Carlborg, 2007). This is a mixed linear model with fixed non-QTL 
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effects and a random QTL effects. Here, either Maximum Likelihood (ML) 
or Restricted Maximum LikeUhood (REML) methods are used and the 
QTL are given by the regions having highest UkeHhood ratio statistic. 

ML maximizes parameter values for both fixed and random effects si- 
multaneously, whereas REML maximizes only the portion of the likeli- 
hood that does not depend on fixed effects. An introduction to ML and 



REML schemes applied to QTL mapping problems is given e.g. in (Lynch and Walsh, 1998) 



and Chapter 23 of ( [Hill, 1989D . 



In this paper we develop and assess efficient and robust computational 
procedures for solving REML estimation problems in QTL mapping set- 
tings where variance component models are used. 

2. Linear Mixed Models and REML Estimation 
A general linear mixed model is given by 

(1) y = Xb + Zu + e 

where }^ is a vector of n observations, X is the nxrif design matrix for n j 
fixed effects, Z is the nxrir design matrix for Uu for random effects, b is the 
vector of rif unknown fixed effects, u is the vector of Uu unknown random 
effects, and e is a vector of n residuals of random effects. 

The additional assumptions for the QTL analysis setting are that ele- 
ments of e are identically and independently distributed and that there is a 
single observation for each individual. In this paper, we focus on the case 
where the model includes a single random effect. The covariance matrix 
for ([I]) then becomes 

(2) V = o-lli + o-ll 

where cr^ is the variance of the random effect and cr^ is the residual vari- 
ance. The matrix IT is referred to as the Identity-By-Descent (IBD) matrix. 

In REML estimation, the task at hand is to determine estimates of cr^ and 
o"g in ([B as well as the value of the likelihood function / at these points. At 
least two approaches can be used for this. One standard scheme is based 
on computing the estimates from the Mixed-Model Equations (MME) in- 
troduced in ( [Henderson, 196 3D. However, this approach requires that the 



IBD matrix is sparse and nonsingular, which is normally not the case 
in the QTL analysis problems. Instead, we use an alternative approach 
which is also used in the standard software available for REML models for 
QTL mapping problems. Comparison of these two approaches is given in 



(Lee and van der Werf, 2006). Here, 
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the parameters cr^, cr^ are obtained as maximizers of the restricted Uke- 
Uhood of the observed data _y. 

The log-UkeUhood function for the REML estimation approach based on 
model © is 



(3) L = -2ln{l) = C + ln(det(V)) + ln(det(X^ V'^ X)) + yP^y, 

where / is the HkeUhood function to be maximized and the projection ma- 
trix P is defined by 

(4) p = y-i - v'^x(x^v'^xy^x^v'K 

The function L is a function of two variables, 

L{i:) = L{(rl,crl) = L{(ru(r2), 

and the problem of maximizing the likelihood function / is equivalent to 
the problem of the minimizing the log-likelihood function L, which has a 
simpler representation. In summary, we determine the estimates of cr^, cr^ 
by solving the optimization problem 

(5) min L{(T\,(T2) 

(6) s.t. CTi > 

(7) cr2 > 

The generalized least square estimates of the fixed effect b may be com- 
puted by first solving the optimization problem © - ([7]) and then comput- 
ing b using 

(8) 'b = {x^y-^xy^x^v-^y. 

where (T\,(T2 in V are the optimizers for the problem ([5]) - d?]). 

3. A Brief Review of Optimization Methods for Maximum Likelihood 

Computations 

Several methods have been used for solving the optimization problems 
arising from maximum-likelihood estimation schemes. The algorithms 
can be classified in several groups, e.g. derivative-free, derivative-based 
(Newton-like), and expectation-maximization (EM) methods, method of 
successive approximations (MSA). These schemes can also be combined in 



different ways. A review is given in ( |Druet and Ducrocq, 2006^ [Harville, 1977 
Hill, 1989t[I^nch and Walsh, 1998D . 

The choice of method for maximizing the likelihood function is affected 
mainly by two factors: Firstly, the computation of the likelihood and its 
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derivatives is computationally demanding, which means that the optimiza- 
tion algorithm should require a small amount of such evaluations in each 
step. Secondly, the optimization algorithm should be efficient and robust, 
which means that it should converge in a small number of iterations and 
the performance should not depend critically on the initial values and the 
properties of the objective function for the specific problem. 



3.1. Derivative-free methods. Derivative-free schemes only employ eval- 
uations of the objective function. Examples are the Nelder-Mead downhill 
simplex method dNelder and Mead, 1964D and the quasi-Newton method 
with finite-difference approximation of the derivatives. For REML prob- 
lems, evaluation of the log-likelihood is rather costly, and the Quasi-Newton 
method with finite-difference approximation is not a very efficient scheme. 
However, it has been shown that the Quasi-Newton method has better 
convergence properties than the downhill simplex method, and the latter 
method can also exhibit non-robust behavior when the initial point is close 
to the maximum ([Meyer, 1989|). 



3.2. Methods using derivative information. The standard schemes for 
optimization in REML schemes use derivative information. These methods 
are based on solution of the nonlinear equation 



(9) 



DLCZ) = 0, 



where DL = (^, ^) is the gradient vector of the log-likelihood func- 
tion. The components of the gradient are expressed in terms of the matrices 
V and P and the variance component parameters cr\^2 using 



(10) 



OCTi dCTi dCTi 



i= 1,2. 



For REML estimation problems, it has been shown that Newton-type meth- 
ods are quite efficient dCallanan and Harville, 1991i [Harville, 1977t [Johnson and Thompson, 
The EM method introduced in (^ Dempster et al., 1977[ ), which is often used 
in general maximum likelihood settings, is not guaranteed to convergence 
to the true minimum and requires more iterations than Newton-type meth- 
ods ([ Lynch and Walsh, 1998[ ). However, the modifications of this method 
were shown to be efficient, see e.g. ( [Callanan and Harville, 1991i [Thompson and Meyer, 198 
The standard Newton method is defined by the iteration 



(11) 



- a^[//(i:^)]"Hz)L(i:^)], 
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with q;^ = 1 and //(E^) = //^ is the Hessian of the log-UkeUhood function, 
which for a REML problem is given by 



dV dV r dV dV 

OCTi OCTj OCTi OCTj 



iJ = 1,2. 



The true Hessian is expensive to evaluate (especially the first term), and 
two approximations have been used: 



(1) Fisher's method of scoring: The Hessian in (|TT|) is substituted by 
it's expected value: //^ — > E(H^) - -F^, where is the Fisher 
information matrix, see e.g. ( [Thompson, 1973| ). 

(2) Average Information (AI) method: The Newton-AI scheme is a 



standard method for solving REML problems, used e.g. in ( [Johnson and Thompson, 1 
Here, the Hessian in (fTTj) is substituted by the so called average 
information matrix (see ( [Gilmour etal., 1995| )): H'^ — > \{H^ + 
E(H'')) = AlK The AI matrix is given by 



(13) 

1 d^L 
AI(o-i,o-j) = -(- 



+ E(- 



■)) = y p- 



dV dV 



-Py, ij=l,2. 



2 dcTidcTj dcTidcTj dcrt dcrj 

As an initial step in this study, the Newton-AI method, as described in 
( [Johnson and Thompson, 1993] ), was implemented and tested. The results 
show that this method results in good performance for cases where the 
maximum is inside the region restricted by the constraints. However, for 
problems where the maximum is at or close to the constraints, the results 
presented in Section 15.11 show that the method may break down since the 
constraints are violated. 

In general, (|5]) - ([7]) presumably should be solved using some optimiza- 
tion method that takes the non-negativity constraints for the variance com- 
ponent parameters into account, see e.g. ( [Callanan and Harville, 1991i 
Harville, 1977t [Meyer and Smith, 1996[). Also, the optimization problem 



may in some cases be non-convex in parts of the domain and/or the ob- 
jective function may be very flat in one direction. In the next section, 
we present some methods which take these properties of the optimization 
problem into account. 



4. New Optimization Procedures for REML Estimation 

We present three difl'erent methods: The standard Newton-AI method 
enhanced with a line search scheme that takes the constraints into account. 
A quasi-Newton scheme where the same type of line search scheme is 
included and the scheme is also modified to deal with non-convex parts of 
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the objective function, and finally an active- set method where the treatment 
of the constraints is built into the algorithm. 



4.1. An enhanced Newton- AI algorithm. Standard unconstrained opti- 
mization methods can be modified to take simple constraints into account 
by introducing a line search scheme which prevents constraint violation, 
and has been suggested for application in REML estimation dHarville, 1977^ 



J. Jensen and Thompson, 1997). We use a simple line search procedure 



where the conditions & and d?]) are initially checked for the full-length 
step Qf^ = 1, and if the constraints are not fulfilled the step length is re- 
duced by a factor of two. Then the constraints are checked again and the 
step length is reduced further if needed. The Newton iteration is termi- 
nated if the relative step length is smaller than some pre-set parameter, 
e.g. 10"^. This means that if an optimum on the boundary is found, the 
line search plays a key role and the line search termination criterion stops 
the iteration. 

Generally, the line search procedure maybe used as well to avoid the 
"overshooting" the minimum or to enforce a decrease of the log-likelihood 
function. To this purpose the described above line search technique was 
used e.g. in ( [Jennrich and Sampson, 1976| ). Moreover, there are more effi- 
cient line search techniques such as Armijo or Wolf line searches. The line 
searches aimed to reducing the value of the objective function require addi- 
tional function evaluations, which is very computationally demanding, so 
undesirable. Moreover, in our algorithm we skip the direct function evalu- 
ations. That is why, in our study we use the line search only for feasibility 
checking, since this does not require any additional function evaluation. 



4.2. An Enhanced BFGS-Quasi-Newton Method. An obvious candi- 
date for solving the REML optimization problem is the quasi-Newton (QN) 
method, where the Hessian is adaptively approximated instead of using 
e.g. the average information matrix as in the Newton-AI scheme. The 
QN-iteration is given by 

/+! = -[^(i:^)]-i[DL(i:^)r 

where H(ll^) = ffi is the approximation of the Hessian at iteration k. A 
reason for using this method is that it is cheaper to update the approxima- 
tion of the Hessian than to compute the AI matrix. 
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There are several updating formulas available for the approximative Hes- 



sian in a QN scheme, see e.g. (Nocedal and Wright, 1999). We use a in 



verse BFGS (Broyden-Fletcher-Goldfarb-Shanno) formula, where an ap- 
proximation of the inverse of the Hessian 5^ = (Z/^)"^ is updated in each 
iteration. In the QN scheme, the gradient is calculated using (fTO|) . Us- 
ing a finite difference approximation would be inefficient, since it requires 
evaluation of the log-likelihood which is computationally expensive. 

The inverse BFGS formula produces a positive definite Hessian approx- 
imation matrix if the curvature condition 



(15) (Ai:^)^A(DL^) > 



holds. Here A(DL^) = {DLf^^ - {DLf. However, this condition does 



not hold if the objective function is non-convex. In this case (|T5|) can 



be enforced explicitly, by imposing restrictions on the step length in the 



line search procedure, see (Nocedal and Wright, 1999). For the REML 



optimization problems, the recursive reduction of the step-length in the 
standard line search algorithms such as e.g. Armijo line search should be 
avoided since they involve evaluations of the log-likelihood. A standard 
approach to avoid problems caused by an indefinite approximation of in- 
verse Hessian is to skip the updating and use 



(16) - & 



when (fT5|) is not fulfilled. However, we found that for the QTL mapping 
problems this type of algorithm sometimes failed since using (fT6]) ignores 
a lot of information about the real curvature of the function. Instead, we 
propose to use the inverse AI matrix as the next approximation of the in- 



verse of the Hessian if condition ( 15 ) is not satisfied. The results in Section 



|5] show that this gives an efficient algorithm. Also, the performance of the 
QN scheme is enhanced if the initial value of the inverse Hessian matrix is 
set to the inverse of the AI matrix. 

The modified inverse BFGS procedure is given by the following algo- 
rithm: 
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Algorithm 1 Modified inverse BFGS updating formula 



s = 1^+^ - 

-k+l xijk 



if (k=0) OR (^^ • 3^ < 0) then 

(17) 5^ = Ar^ 
else 

P = 7~s 

(18) B' = (I-p-s-y'^)- B'-' ■{I-p-ys')+p-s-s' 
end if 

The line search procedure described above, ensuring that the constraints 
are fulfilled, is used in the same way in the QN method as in Newton- AI 
method. 

4.3. An Active-Set Method. The active-set method is a Newton-type method 



for constraint problems, see e.g. ( |Nash and Sofer, 1996D . The constraints 



are automatically satisfied in each iteration, and the gradient and Hessian 
are calculated in a reduced space: If A is the matrix of gradients of active 
constraints and N is the null space of the matrix A, the reduced gradient 
and Hessian are given by N^DL and N^HN. The iterative scheme for this 
method is: 

^^^^ H'^' = + a' - p'^' 

The optimality condition is checked by examining of Lagrangian multipli- 
ers A at the point of potential optimum S*: 

(20) A = AIdL{T) 
where A^ is right inverse of matrix A computed as 

(21) Ar = A^[AA^r^ 



The active- set strategy for the REML was implemented in dCallanan and Harville, 19911 ) 
as well. In our study we approximate the Hessian both using the AI ma- 
trix and the inverse BFGS formula (the Hessian //^ is approximated by 
(5^)~^). In this case we use a line search procedure where the current step 
is, if needed, reduced so that the next point lies exactly on the relevant con- 
straint. The step length is controlled by the pre-set parameter 10~^ and 
iterations terminate if the current step length is smaller than this value. 
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If no constraints are active, the active-set and corresponding unconstrained 
Newton methods (AI and QN) are equivalent, and the same sequence of it- 
erations are generated. 

The numerical procedure for all methods are summarized in algorithm 
described in the Algorithm [2l 

5. Numerical Experiments 

In this section, we present numerical experiments performed for two rep- 
resentative sets of experimental data that come from the same population. 
We compute the optimal values of the variance of a single random effect 
(riu = 1) and the residual, i.e. we solve two-dimensional optimization prob- 
lems for cTi 2 under the constraints that cri > 0,cr2 > 0. The population 
size is n = 767 (which is quite typical in QTL analysis) and there is a 
single fixed effect (uf = 1). For data set 1, the optimum is distinctly de- 
fined and located inside the feasible search domain, the optimal values are 
cTj = 4868 and 0-2 = 20644. For data set 2, the optimum is found at one of 
the constraint boundaries, the optimal values are cri = and 0-2 = 29681. 
Also, the objective function is non-convex and very flat at the optimum in 
the cr2-direction, making the computed optimal value of 0-2 very sensitive. 

In practice the log-likelihood is evaluated with the accuracy of 4 - 5 
decimals. In our implementation we skipped the evaluation of the log- 
likelihood to make our computations cheaper. As a termination criterium 
the magnitude of the gradient/reduced gradient were used, these quantities 
are computed by as a part of the algorithm, so they do not require extra 
computational work. 

5.1. The Standard Newton- AI Method. We begin by showing results 
for a case where the standard Newton- AI method fails. For this method, 
we use the termination criterion ||Z)L||2 < 10~^. In Figure [H the values of 
the objective function, the norm of gradient, and the variance components 
are shown as functions of the iteration number in the Newton scheme for 
data set 2. For this problem, the optimum is found at the constraint cri = 0. 
From Figure [B it is clear that the constraint is violated and cri becomes 
negative already after the first iteration. 

5.2. The Enhanced Newton- AI Method. In Figure l2l the results for the 
enhanced Newton- AI method described in Section 14.11 are shown for data 



set 2. In this case, the termination criterion o;^ < 10"^ for the line search 
is added. This is also the criterion responsible for stopping the iteration, 
and from Figure |2] it is clear that the enhanced method does find the correct 
minimum, despite the fact that the objective is non-convex. Introducing 
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Algorithm 2 Short Algorithm of AS and QN Methods 

Initialization: 

Set up A, X, y, iP, type of Hessian approximation 
resvar = varCy^ - X ■ (X^ ■ Xy^ • X^ • y'^) 

= iP- resvar 
k = 

Main loop: 

while \\DL\\l > 10'^ do 

if ((AS) and (||A^^ • DL\\l < 10'^)) then 
if (no active constraints) then 

break; 
end if 

Compute A by (ED]) 
if (A > 0) then 

break; 
end if 

update A and N 
end if 

k^k+l 

Compute y by (12]) 
Compute P by dH) 
Compute DL by 

Compute Hessian depending on type of Hessian approximation: 

(5^ by Algorithm [B, H'' or (AI'^ by (fT3l)) 

Newton-AI or Active-Set Method Step 

Compute direction p'^ from (fT4)) or (fT9l) 

Find step length a^: 

if (Of < 10-5) then 

break; 
end if 

if ((AS) and (new active constraints)) then 

update A and N; 
end if 
end while 

Evaluate j3 by 1^ 
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Figure 1 . The standard Newton- AI method for data set 2 
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Figure 2. The enhanced Newton- AI method, including Hne 
search, for data set 2 

the Hne search procedure does not change the performance of the method 
for unconstrained problems, i.e. for data set 1. In this case, the standard 
Newton- AI scheme and the enhanced version produce the same iteration 
sequences and the same (quite acuurate) results. 

The Enhanced Quasi-Newton method. We now present results for the 
enhanced quasi-Newton method for data sets 1 and 2, and compare these 
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iter. 


Quasi-Newton 


Newton-AI 




T / \ 

L(cri, 0-2) 




T / \ 

L(cri, 0-2) 


1 


2964.3089, 29643.089 


-4239.3313 


2964.3089, 29643.089 


-4239.3313 


2 


4810.993, 17002.992 


-4225.8047 


^ f X < /"\ /X /X ^ 1 ^T/^/^/^ /x /x /-N 

4810.993, 17002.992 


-4225.8047 


3 


4615.991, 23783.778 


-4222.1107 


4761.311, 20046.156 


-4218.9801 


4 


4749.718, 21684.535 


-4219.2220 


A fx y^ /"\ 1 ^7^7 /x /' /-N fx fx ^ 

4860.177, 20628.874 


-4218.8175 


5 


4855.487, 20316.669 


-4218.8628 


^ fx /' /X /-N fx /"\ /' A A ^ f X 

4869.628, 20644.358 


-4218.8173 


6 


4845.639, 20679.226 


-4218.8178 


^ fx fx ^ A /' /"\ /' A A /' ^ '\ 

4868.546, 20644.651 


-4218.8173 


7 


4oJO.JJZ, zUd4 /.I / / 


-4zlo.ol /4 






8 


4862.895, 20644.420 


-4218.8173 






Q 


4868.459, 20644.221 


-4218.8173 






10 


4868.738, 20644.552 


-4218.8173 







Table 1. Convergence histories for data set 1, enhanced quasi- 
Newton and Newton-AI schemes 
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Figure 3. The enhanced quasi-Newton method, including line 
search, for data set 1 

results to those of the enhanced Newton-AI scheme. In Tables [I] and [21 the 
convergence histories for the two data sets are compared. 

From Table [B we draw the conclusion that for data set 1, where the 
minimum is clearly defined and inside the search region, the Newton-AI 
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Figure 4. The enhanced quasi-Newton method, including Hne 
search, for data set 2 



iter. 


Quasi-Newton 


Newton-AI 


O-U 0-2 


L(cri, 0-2) 


o-u 0-2 


L(cri, 0-2) 


1 


2964.3089, 29643.089 


-4330.3621 


2964.3089 , 29643.089 


-4330.3621 


2 


1009.792, 29691.069 


-4328.7130 


1009.792, 29691.069 


-4328.7130 


3 


109.098, 29685.878 


-4327.6635 


109.098, 29685.878 


-4327.6635 


4 


27.693, 29690.757 


-4327.5708 


27.693, 29690.757 


-4327.5708 


5 


0.552, 29699.832 


-4327.5458 


10.663, 29693.972 


-4327.5545 


6 


0.125, 29699.982 


-4327.5455 


4.466, 29695.385 


-4327.5490 


7 


0.049, 29700.008 


-4327.5454 


1.777, 29696.043 


-4327.5468 


8 


0.011,29700.021 


-4327.5454 


0.520, 29696.359 


-4327.5458 


9 


0.002, 29700.024 


-4327.5454 


0.216, 29696.437 


-4327.5455 


10 


0.001, 29700.024 


-4327.5454 


0.065, 29696.476 


-4327.5454 


11 






0.028, 29696.485 


-4327.5454 


12 






0.009, 29696.490 


-4327.5454 


13 






0.004, 29696.491 


-4327.5454 


14 






0.002, 29696.492 


-4327.5454 



Table 2. Convergence histories for data set 2, enhanced quasi- 
Newton and Newton-AI schemes 
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method converges faster that the quasi-Newton scheme. In this case, the 
average information matrix is a good approximation of the Hessian, and 
inverse BFGS updating formula provides no improvement. 

The results in Table El show that for data set 2, the convergence rates of 
the two methods are different, too. In this case, the convergence properties 
are determined not only by the line search procedure which is the same for 
the two schemes, but also by the method of approximation to the Hessian. 
We note that the quasi-Newton scheme actually produces faster solution. 
The first three iterations of both methods are identical since the objective 
function is non-convex in the large region covering the initial guess. Ac- 
cording to the Algorithm [U the quasi-Newton scheme exploits the average 
information matrix as Hessian approximation. After the third iteration the 
curvature condition (fT5|) holds, so that the in quasi-Newton method the in- 
verse BFGS approximation to the Hessian is used. As a result, this method 
produces the longer step toward minimum which causes the faster conver- 
gence of the quasi-Newton scheme than the Newton- AI scheme. Since the 
objective function is extremely flat at the solution in the cr2-direction, the 
accuracy of both solutions is similar and is accepted as a correct one, while 
the exact solution (cri = 0) is produced only by the active-set methods. 

Generally, the convergence of the unconstrained methods applied to the 
problems with optimum at constraint is quite slow in the neighborhood of 
the solution which is due to the inefficiency of the line search procedure. 
In Figures [3] and ID we show the results for the quasi-Newton scheme in 
graphical form. 

5.3. The Active-Set Method. Finally, we present experiments where the 
active-set method described in Section |43]is used. Here, we use the termi- 
nation criterion WN^DLW^ < 10~^, additionally the criterion < 10"^ for 
the line search is used. As it was for unconstrained methods, this criterion 
is responsible for stopping the iterations. Figure |6] and |7] show the con- 
vergence history for the active set method using the average information 
approximation to the Hessian for data set 1. 

In this case, the optimum is located inside the search region. We have 
verified that the results in Figure [6] are identical to the corresponding results 
for the standard Newton- AI scheme, as expected. When the quasi-Newton 
scheme with the BFGS formula is used in the active-set scheme for data 
set 1, the convergence history is slightly different from the enhanced quasi- 
Newton scheme described above, see Figures [6] and [3] . This difference is 
due to the way of the approximation to the Hessian: in one case the inverse 
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of the Hessian is approximated, while in the other case the inverse of the 
inverse of the Hessian is computed. 

For the active-set method it is interesting to study the performance for 
data set 2, where the optimum is located at one of the constraints. Since the 
objective function is non-convex, the average information matrix is used as 
approximation to the Hessian. As a result, the methods with the average 
information and quasi-Newton approximations of the Hessian produce the 
identical solutions. Figure |5] shows the result for the active-set scheme 
where the average information of the Hessian are used. This results should 
be compared to Figures |2] and IH where the corresponding unconstrained 
schemes enhanced with a line search procedure are used. The convergence 
histories for the active-set methods are shown in Tabled 

It is clear that the active-set methods produces faster convergence. The 
step lengths is each iteration are larger, and the approximations change 
more rapidly in the first iterations. In practice, the minimum is reached 
after 3-4 iterations. Moreover, the active-set methods produce more accu- 
rate solution than the unconstrained methods. The general conclusion is 
that the active-set approach using the average information approximation 
for the Hessian is the most robust scheme. 
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Iteration numbe 



Variance componenti x 1 o'' Variance componGnt2 




Figure 5. Active-set method with AI approximation for the 
Hessian, for data set 2 



iter. 


Quasi-Newton 


AI 




L(cri, 0-2) 


o-i, 0-2 


L(cri, 0-2) 


1 


2964.3089, 29643.089 


-4330.3621 


2964.3089, 29643.089 


-4330.3621 


2 


0.000, 29715.858 


-4327.5456 


0.000, 29715.858 


-4327.5456 


3 


0.000, 29681.799 


-4327.5453 


0.000, 29681.799 


-4327.5453 


4 


0.000, 29681.799 


-4327.5453 


0.000, 29681.799 


-4327.5453 



Table 3. Convergence histories for the active-set method us- 
ing quasi-Newtion and average information approximation for 
the Hessian, data set 2 
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Figure 6. The active-set method, AI approximation for the 
Hessian, for data set 1 
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Figure 7. The active- set method with quasi-Newton approxi- 
mation for the Hessian, for data set 1 
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6. Conclusions 

In this paper we consider optimization procedures for maximizing the 
log-HkeUhood for REML models used in a QTL mapping setting. We 
first show that the standard Newton- AI scheme fails for a problem where 
the optimum is located at a constraint boundary. Then we show how this 
scheme can be modified to produce a correct solution also in these cases by 
including a simple line search procedure. We also introduce an enhanced 
quasi-Newton scheme, where the line search procedure is included and 
where the average information matrix is used both as a starting guess and 
at locations where the curvature criterion does not hold. A strong side of 
this method is that for non-convex functions a better approximation of the 
Hessian than the average information matrix can be computed. Generally, 
we want to point out that the unconstrained methods considered in this 
framework are sensitive to the choice of the approximation to the Hessian. 

As a second step we describe how an active-set method, which automati- 
cally includes the constraints, can be used for solving the REML optimiza- 
tion problems. For the data set where the optimum is located at one of the 
constraint boundaries, the cpu-time is reduced by approximately a factor 
of two compared to the corresponding unconstrained method. 

In our numerical experiments we used the termination criterium ||/)L||2 < 
10"^ or ||A/^^DL||2 < 10~^ which turned out to be unnecessary low. 

The overall conclusion is that for problems of the type considered here, 
the active-set method is robust, and should be preferred compared to using 
an unconstrained method. Moreover, the method using the average infor- 
mation matrix for the Hessian approximation gives fast and robust results 
when optimum is located inside the feasible region. 
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